Abstract

The problem of defining a statistical ensemble of random graphs
with an arbitrary connectivity distribution is discussed. Introducing
such an ensemble is a step towards understanding the geometry of wide
classes of graphs independently of any specific model. This research
was triggered by the recent interest in the so-called scale-free networks.

1 Introduction

This is a workshop talk and therefore I do not hesitate to report on partial
results of research still in progress. I shall also put a couple of
queries to you, in the hope of attracting your interest and triggering a discussion.
I have benefited from collaboration with Z. Burda, the late J.D. Correia and
J. Jurkiewicz (cf. ref. [] and papers quoted therein).

Let me recall that a graph is just a collection of vertices (nodes) and links
(edges) connecting vertices. It is a mathematical idealization representing 
various networks one encounters in nature, in social life, in engineering, etc. 
For example, the web can be represented by a graph: the vertices are the 
URLs and the links are the hyperlinks. Likewise, the network of sexual 
relations in a population can be represented by a graph. The study of its 
geometry has some interest for epidemiology. In these examples, as in many 
other ones, the pattern of connections between vertices is fairly random. The 
concept of a random graph emerges quite naturally. For definiteness, I shall
consider graphs with undirected links only. 

When one is talking about random graphs, one has of course in mind a 
statistical ensemble of graphs. How to define such an ensemble? The simplest 
answer is given in the framework of the classical model developed by Erdos,
Renyi and their followers []: in a set of $N$ vertices one connects at random $L$
out of $N(N-1)/2$ possible pairs of vertices. All possible graphs constructed
that way form the ensemble in question. The probability $p = 2L/N(N-1)$ to connect a pair
of vertices is the control parameter of the model. The geometry of graphs
changes in a very interesting and by now fully understood manner when $p$
changes. However, in this ensemble the distribution of connectivity (vertex
degree) is always Poissonian.
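
As a minimal sketch of this ensemble, one can draw $L$ distinct pairs of vertices uniformly at random and compare the resulting degree histogram with a Poisson law of mean $2L/N$. The function names below are illustrative only and are not taken from any reference or library.

```python
import math
import random
from collections import Counter


def erdos_renyi_fixed_links(n_vertices, n_links, rng):
    """Draw n_links distinct pairs uniformly among the N(N-1)/2 possible ones."""
    if n_links > n_vertices * (n_vertices - 1) // 2:
        raise ValueError("more links requested than available pairs")
    links = set()
    while len(links) < n_links:
        i, j = rng.sample(range(n_vertices), 2)  # two distinct vertices
        links.add((min(i, j), max(i, j)))        # undirected: store as i < j
    return sorted(links)


def degree_histogram(n_vertices, links):
    """Return {degree: fraction of vertices having that degree}."""
    degree = [0] * n_vertices
    for i, j in links:
        degree[i] += 1
        degree[j] += 1
    counts = Counter(degree)
    return {n: c / n_vertices for n, c in sorted(counts.items())}


def poisson(mean, n):
    """Poisson probability of the value n for the given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


if __name__ == "__main__":
    rng = random.Random(0)
    N, L = 2000, 3000  # mean degree 2L/N = 3
    hist = degree_histogram(N, erdos_renyi_fixed_links(N, L, rng))
    for n, fraction in hist.items():
        print(n, round(fraction, 4), round(poisson(2 * L / N, n), 4))
```

The two printed columns should agree up to statistical fluctuations when $N$ is large and $L/N$ is kept fixed.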

It turns out that connectivity distributions very different from Poissonian
are found in a variety of observed networks. In particular, in a number
of interesting networks this distribution has a tail falling like a power of the
vertex degree. These networks have been baptized scale-free by Barabasi and
Albert []. The properties of scale-free networks are commonly discussed in
the framework of simple growth models (where the connectivity distribution
becomes stationary and scale-free at large time). These models are invaluable
for illustrating basic dynamical mechanisms, like the preferential attachment
rule. However, they are not fully realistic. For a variety of reasons one would
like to understand the generic geometries of wide classes of graphs. This
can presumably be better achieved by defining consistently the corresponding
statistical ensembles, instead of producing more and more complicated
growing network models.

The aim of this talk is to discuss problems one encounters trying to define 
a statistical ensemble of random graphs with an a priori given connectivity 
distribution. The definition can be more or less formal. It can be implicit, 
reducing to the formulation of an algorithm enabling one to sample graphs, 
for example with the help of a computer. 
2 The Molloy-Reed construction



Let $p_n$ denote here the connectivity distribution. In ref. [] Molloy and Reed
propose a specific method of constructing graphs with a given $p_n$. They
proceed in two steps:

(a) First, $N$ auxiliary graphs are created. The number of links of an
auxiliary graph is randomly generated from the probability distribution $p_n$.
By construction all these links meet at a common vertex and have the other
end free. The number of free-end links in the full set of auxiliary graphs must
be even, otherwise one restarts the construction.

(b) Second, in the full set of $N$ auxiliary graphs the successive pairs
of free link ends are picked at random and connected, until no free link end
remains.

In this manner, one creates a single graph with vertex degrees $n_1, n_2, \ldots, n_N$.
Notice that the number of links of that graph, $L = \frac{1}{2}\sum_j n_j$, is not kept fixed.
In the ensemble of graphs it does fluctuate around the average value $\frac{1}{2}N\langle n \rangle$,
where $\langle \ldots \rangle = \sum_n \ldots\, p_n$. This is perhaps a weak point of the construction, since
$L/N$ is a sensitive parameter in graph theory. On the other hand, it is very
pleasant that the connectivity distribution matches $p_n$ for individual graphs.

Notice also that these graphs are, in general, not connected. Furthermore,
they are, in general, "degenerate": there may be multiple connections
between vertices and certain links may connect a vertex to itself (footnote 1). For a given
set $n_1, n_2, \ldots, n_N$ a non-degenerate graph may simply not exist. Moreover,
enforcing non-degeneracy, when it is possible, introduces a bias. Although in
each graph the connectivity distribution matches $p_n$ up to fluctuations, significant
deviations from $p_n$ can appear when the distribution is calculated for a
large ensemble of graphs, if certain fluctuations are systematically favoured.
This remark is particularly pertinent to the case of scale-free graphs, where
the connectivity distribution has a long tail, subject to important fluctuations.
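
Steps (a) and (b), together with a count of the degeneracies just mentioned, can be sketched as follows. This is only an illustration under simple assumptions: the distribution is passed as a dictionary `{n: p_n}` with finite support, and the names `molloy_reed` and `count_degeneracies` are hypothetical, not taken from the original reference.

```python
import random
from collections import Counter


def sample_degree(p, rng):
    """Draw a degree n from p, a dict {n: probability}."""
    degrees = list(p)
    return rng.choices(degrees, weights=[p[n] for n in degrees])[0]


def molloy_reed(n_vertices, p, rng):
    """Return (degrees, links); links may contain self-links and multiple links."""
    while True:
        degrees = [sample_degree(p, rng) for _ in range(n_vertices)]
        if sum(degrees) % 2 == 0:  # step (a): the number of free ends must be even
            break
    # One entry per free link end, labelled by the vertex it is attached to.
    ends = [v for v, n in enumerate(degrees) for _ in range(n)]
    rng.shuffle(ends)  # step (b): a uniformly random pairing of the free ends
    links = [(ends[a], ends[a + 1]) for a in range(0, len(ends), 2)]
    return degrees, links


def count_degeneracies(links):
    """Return (number of self-links, number of surplus parallel links)."""
    self_links = sum(1 for i, j in links if i == j)
    multiplicity = Counter((min(i, j), max(i, j)) for i, j in links if i != j)
    surplus = sum(m - 1 for m in multiplicity.values())
    return self_links, surplus


if __name__ == "__main__":
    rng = random.Random(0)
    degrees, links = molloy_reed(1000, {1: 0.5, 2: 0.3, 3: 0.2}, rng)
    print(len(links), count_degeneracies(links))
```

Rejecting every sample for which `count_degeneracies` is not `(0, 0)` would enforce non-degeneracy, at the price of the bias discussed above.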



Footnote 1: I met the opinion that in this construction the degenerate graphs become unimportant
in the limit $N \to \infty$. This is false. It is easy to count graphs. When $N \to \infty$ and $x = L/N$
is kept fixed, the non-degenerate graphs are a finite fraction, viz. $\exp[-2x(1+x)]$, of all
possible graphs.
3 Minifield theory: random graphs and Feynman diagrams



The minifield theory is defined by the following formal integral

$$Z \sim \int d\phi \, \exp \frac{1}{\kappa}\left[-\phi^2/2\lambda + \sum_n p_n \phi^n\right] \qquad (1)$$

where the integration variable is a real number, $\kappa, \lambda, p_1 > 0$ and $p_n \ge 0$
for $n > 1$. Although, strictly speaking, the integral does not exist, the
perturbative expansion of $Z$ in powers of the "couplings" $p_n$ is well defined.
As in field theory, the individual terms of the expansion can be represented by
Feynman diagrams. The "propagator" equals $\lambda$, $\kappa$ plays the role of the Planck
constant and $p_1$ that of an "external current" (a pedagogical presentation for
people not very familiar with field theory methods can be found in []).

The idea is to identify the Feynman diagrams of this toy model with 
the graphs of a statistical ensemble. Indeed, the Feynman diagrams of the 
minifield theory are the graphs familiar to people working on networks, except 
that there is a specific weight - the "Feynman amplitude" - attached to each
graph. In the "semiclassical limit" $\kappa \to 0$ only tree graphs survive and the
model is exactly solvable.

According to the Feynman rules, the weight of a non-degenerate graph
with $N$ vertices and $L$ links is

$$\text{weight} = \kappa^{L-N}\, \frac{\lambda^L}{N!} \prod_{j=1}^{N} n_j!\, p_{n_j} \qquad (2)$$

In the presence of degeneracies one has to multiply the rhs by the standard
symmetry factors. Actually, the construction of Feynman diagrams does not
differ from the construction of graphs following the Molloy-Reed recipe. Here,
the auxiliary graphs are those defined by the "interactions" $p_n \phi^n$. However,
the weight factor $\kappa^{L-N} \lambda^L / N!$ does not appear there; the fluctuations of $L$
result from fluctuations of the generated vertex degrees. In contrast, we
introduce here a specific fugacity of links, $\lambda$, and a parameter, $\kappa$, controlling
the number of loops in connected components.

The following Metropolis algorithm generates graphs with fixed $N$ and
$L$: one picks a random link $ij$ and a random vertex $k \neq i, j$ and one rewires
$ij \to ik$ with probability

$$P_{\text{rewire}} = \frac{(n_k + 1)\, R(n_k + 1)}{n_j\, R(n_j)} \qquad (3)$$

when the rhs above is less than unity, and with probability equal to one
otherwise. Here $R(n) = p_n / p_{n-1}$. When $n_j = 1$, the attempt is rejected,
so that vertices with zero connectivity are never created. The rhs of (3)
follows from (2) and the detailed balance condition. It turns out that this
last condition ensures that the symmetry factors in the weights of degenerate
graphs come out correctly too.
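
The rewiring move can be sketched in a few lines. This is a hedged illustration rather than the code actually used: the data layout (a list of two-element lists for the links, a list of degrees), the function name `rewire_sweep` and the choice of which end of the picked link plays the role of $j$ are all assumptions. The acceptance ratio is written out as $(n_k+1)\,p_{n_k+1}\,p_{n_j-1} / (n_j\,p_{n_k}\,p_{n_j})$, which is the rhs of (3) with $R(n) = p_n/p_{n-1}$.

```python
import random


def rewire_sweep(links, degree, p, n_attempts, rng):
    """Attempt n_attempts rewirings ij -> ik in place; return the number accepted.

    links  : list of [i, j] pairs (multiple links are allowed)
    degree : list of vertex degrees consistent with links
    p      : function n -> p_n, assumed strictly positive for every degree reached
    """
    n_vertices = len(degree)
    accepted = 0
    for _ in range(n_attempts):
        link = rng.choice(links)
        end = rng.randrange(2)        # which end of the link plays the role of j
        j, i = link[end], link[1 - end]
        if degree[j] == 1:            # never create a vertex of degree zero
            continue
        k = rng.randrange(n_vertices)
        if k == i or k == j:          # the new end must differ from i and j
            continue
        ratio = ((degree[k] + 1) * p(degree[k] + 1) * p(degree[j] - 1)
                 / (degree[j] * p(degree[k]) * p(degree[j])))
        if ratio >= 1 or rng.random() < ratio:
            link[end] = k             # the link ij becomes ik
            degree[j] -= 1
            degree[k] += 1
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    N = 1000
    links = [[v, (v + 1) % N] for v in range(N)]  # a ring: N vertices, L = N links
    degree = [2] * N
    rewire_sweep(links, degree, lambda n: 1.0 / (n * (n + 1) * (n + 2)), 200 * N, rng)
    print(min(degree), max(degree))
```

Both $N$ and $L$ are conserved by construction, since every accepted move only transfers one link end from $j$ to $k$.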

The presence of the factor $(n_k + 1)/n_j$ on the rhs of (3) means that the
rewired vertices are sampled independently of their degree. Furthermore, the
rewiring depends on the vertex degrees only and is insensitive to the rest of
the underlying graph structure. Hence, as far as the distribution of vertex
degrees is concerned, the model is isomorphic to the well known balls-in-boxes
model [], defined by the partition function

$$z \sim \sum_{\{n_i\}} p_{n_1} \cdots p_{n_N}\, \delta\Big(M - \sum_{j=1}^{N} n_j\Big) \qquad (4)$$

and describing $M$ balls distributed with probability $p_n$ among $N$ boxes (in
our case $M = 2L$). The constraint represented by the Kronecker delta on the
rhs of (4) is satisfied "for free" when $N \to \infty$ by virtue of Khintchin's law of
large numbers, provided $\langle n \rangle < \infty$ and $M/N = \langle n \rangle$. When the last condition
is met the occupation number distribution of a single box is just $p_n$.

Consequently, in the statistical ensemble including degenerate graphs
the connectivity distribution is $p_n$ provided the number of links is set to
$L = \frac{1}{2}N\langle n \rangle$ (notice that this is the average number of links in the Molloy-Reed
construction). It is easy to calculate the number of such graphs for fixed
$L/N$. It increases with $N$ like $\exp[\text{const} \cdot N \log N]$: the ensemble is
overextensive (footnote 2). Hence, it is not guaranteed that the connectivity distribution is
$p_n$ for individual graphs; it is so when one averages over the ensemble. This
should not be a serious flaw in applications.

The algorithm also works very well for trees. It suffices to start with
a tree graph, for example with a polyline, and impose the constraint that
$n_i = 1$. Then, all successively generated graphs are also trees. As already
mentioned, the model is analytically solvable when one limits one's attention
to tree graphs. One can show exactly that in this case the connectivity
distribution is $\sim n p_n$ (footnote 3). Hence, in order to get an a priori given connectivity
distribution $P_n$ one should set the couplings of the tree model to $p_n \sim P_n/n$.
Footnote 2: The ensemble of non-degenerate graphs is overextensive too (cf. footnote 1);
it becomes extensive in the limit $\kappa \to 0$, i.e. for tree graphs.

Footnote 3: The following heuristic argument can help to understand that: trees can always be
embedded in a plane. They are obtained by gluing successive vertices (auxiliary graphs).
But each vertex with $n$ links attached to it could have been rotated in the plane up to $n$
times before being glued to the tree it belongs to and this rotation would not affect the
result. Consequently, the weight of a vertex is $\sim n p_n$ instead of $p_n$, because of this specific
symmetry.

A fairly comprehensive discussion of the ensemble of random tree graphs
is presented in [], with emphasis on the hot problem of scale-free graphs. I
shall not enter into this discussion here, apart from the few words to follow.
The partition function (1) can be calculated in the saddle point ("semiclassical")
approximation. The saddle point condition, identical to a familiar
equation in polymer physics, is a starting point for further calculations. In
particular, one can find the fractal dimension $d_H$ of the tree graphs. This
was first done in [] for the so-called generic case, with the result $d_H = 2$.
For scale-free graphs the connectivity distribution falls like $n^{-\beta}$ and one finds
[], [] in the empirically interesting situation $2 < \beta < 3$:

$$d_H = (\beta - 1)/(\beta - 2) \qquad (5)$$

while $d_H = 2$ again for $\beta > 3$. An infinite $d_H$ is found in the rather special
case, where a singular vertex with fixed degree of order $O(N)$ is present in
(almost) all trees of the ensemble (footnote 4).

Footnote 4: The Cayley tree, a graph with a minimal entropy in our ensemble, also has $d_H = \infty$.

It is very easy to supplement the algorithm with a constraint ensuring that
all produced graphs are non-degenerate. However, this introduces a bias. We
do not know yet how to choose the input data, i.e. the couplings $p_n$, in order to
get at the output a desired connectivity distribution. The problem is solved
for degenerate graphs and for trees, as stated above, but for non-degenerate
graphs it remains open:

Query: What are the minifield theory couplings $p_n$ leading to a given
connectivity distribution in the ensemble of non-degenerate graphs?

The ensemble of graphs defined by (1) is fairly general, but not the most
general one: the weight of a graph is a product of factors corresponding to
individual vertices. One can introduce correlations between neighbor vertices
replacing (1) by

where $\phi = (\phi_1, \phi_2, \ldots, \phi_q)$ and $A$ is some $q \times q$ symmetric matrix with positive
elements. The cut-off $q$ can be eventually sent to infinity (but in order to
study the tree content of the model the limit $\kappa \to 0$ should be taken first).
This model has not been studied yet: 

Query: What are the properties of the ensemble defined by (6), for reasonable
choices of the correlation inducing matrix $A$? Even a study of the
"semiclassical" limit alone would be of interest.



4 Growing networks 

Recently, much activity has been devoted to the formulation of growing network
algorithms producing the so-called scale-free graphs (see, for example,
refs. [], [], [9]). In these models and at large "time" the average connectivity
becomes stationary, except for the tail where finite size (time!) corrections are
felt. A repeated use of such a growing network algorithm defines a statistical
ensemble and, with this strategy, it is not difficult to produce non-degenerate
graphs only. The connectivity distribution cannot be chosen at will: it has
a shape specific to the model at hand. But one can usually adjust the parameters
of the algorithm to control the large vertex degree behavior. The
major problem with this approach is that it is difficult to decide whether the
results one obtains are generic or just reflect the specific dynamics of a rather
simple model.

Let me illustrate this point with an example in the next section. 



5 Graph diameters 

Consider tree graphs with connectivity distribution

$$P_n = \frac{4}{n(n+1)(n+2)} \qquad (7)$$

They can be generated by the Barabasi-Albert growing network recipe [], [],
or by the algorithm presented in Sec. 3, provided the couplings are set
to $p_n = P_n/n$. For a given graph let $n(r)$ denote the number of vertices
separated by geodesic distance $r$ from a randomly chosen "reference" vertex.
Averaging over the ensemble of graphs one is interested in and over the possible
choices of the "reference" vertex, one gets a specific "two-point function"
$\langle n(r) \rangle$, which can be used to define the average diameter of a graph. All
this is easily done on a computer. The result, illustrated in Figs 1 and 2,
is that $\langle n(r) \rangle$ is very different in the two models. In the Barabasi-Albert
model the graph diameter grows like $\log N$, while in the model of Sec. 3 it
grows like a power of $N$ (see Fig. 3). Manifestly, the Barabasi-Albert model
explores only a fraction of the available phase space. This is simply explained:
the vertices of highest degree are the oldest ones and tend in this model to
be close to each other. Consequently, the distance between other vertices
is also much smaller than in a truly random tree. Another deviation from
randomness in growing networks was observed earlier by Callaway et al. [10].

Figure 1: The normalized two-point function $\langle n(r) \rangle$ versus $r$ in the statistical ensemble defined
in [], calculated for the number of nodes $N = 100, 400, 1600, 6400$. The connectivity
distribution is given by eq. (7).

Figure 2: The normalized two-point function $\langle n(r) \rangle$ versus $r$ in the ensemble of graphs generated
by the growing network algorithm proposed by Barabasi-Albert in [], calculated for the
number of nodes $N = 100, 400, 1600, 6400$. The connectivity distribution is given by eq. (7).

Figure 3: The average size of a graph $\langle r \rangle$ versus $N$ in the two models. It is seen that in
the Barabasi-Albert model the growth of $\langle r \rangle$ is logarithmic.
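
The measurement of $n(r)$ on the growing-network side can be sketched as below. This is a minimal illustration under stated assumptions, not the program behind the figures: trees are grown with one link per new vertex and attachment probability proportional to degree, $n(r)$ is obtained by breadth-first search, and the average size is taken to be $\langle r \rangle = \sum_r r\, n(r) / \sum_r n(r)$, which is an assumed definition. All function names are illustrative.

```python
import random
from collections import deque


def barabasi_albert_tree(n_vertices, rng):
    """Grow a tree: each new vertex attaches to an old one chosen with
    probability proportional to its degree (one link per new vertex)."""
    neighbours = [[1], [0]]      # start from a single link 0-1
    ends = [0, 1]                # every vertex appears once per link end
    for new in range(2, n_vertices):
        old = rng.choice(ends)   # uniform over link ends = proportional to degree
        neighbours.append([old])
        neighbours[old].append(new)
        ends += [old, new]
    return neighbours


def shell_sizes(neighbours, reference):
    """Return [n(1), n(2), ...]: numbers of vertices at geodesic distance r."""
    distance = {reference: 0}
    queue = deque([reference])
    shells = []
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                if distance[w] > len(shells):
                    shells.append(0)
                shells[distance[w] - 1] += 1
                queue.append(w)
    return shells


def mean_distance(neighbours, n_references, rng):
    """Estimate <r>, averaged over randomly chosen reference vertices."""
    total = 0.0
    for _ in range(n_references):
        shells = shell_sizes(neighbours, rng.randrange(len(neighbours)))
        total += sum(r * n for r, n in enumerate(shells, start=1)) / sum(shells)
    return total / n_references


if __name__ == "__main__":
    rng = random.Random(0)
    for N in (100, 400, 1600, 6400):
        tree = barabasi_albert_tree(N, rng)
        print(N, round(mean_distance(tree, 50, rng), 2))
```

The same `shell_sizes` and `mean_distance` functions can be applied to trees produced by any other sampler once they are stored as adjacency lists, which is what a comparison of the two ensembles requires.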



Incidentally, it appears that $\langle r \rangle \sim \log N$ in the ensemble of degenerate
graphs with the same $P_n$ generated by the algorithm of Sec. 3 (see Fig. 4).
Intuitively it is obvious that the growth of the diameter becomes slower when
loops can be formed often enough, since they produce "shortcuts".

Actually, the "small world" behavior $\langle r \rangle \sim \log N$ is found in a large
variety of networks. I do not know any rigorous derivation of this result in
a sufficiently general context. In the network community one often refers to
ref. [11]. Unfortunately, although ref. [11] is otherwise an interesting paper,
their derivation of this logarithmic growth is mathematically incorrect. They
have in mind the Molloy-Reed construction, but they actually consider tree
graphs with uncorrelated vertex degrees, and for such trees the diameter
usually grows as a power of $N$.

Figure 4: The average size $\langle r \rangle$ of the giant component versus $N$ = average number of nodes in the
component, for general (degenerate) graphs with loops. $N$ is not very large and the finite size
correction to $\langle r \rangle$ is important. The connectivity distribution is (7) for the full graph and
falls also roughly like $n^{-3}$ for the giant component.

To see the mistake, notice that for a given "reference" vertex one has

$$1 + n(1) + n(2) + \ldots + n(r_{\max}) = N \qquad (8)$$

Newman et al. replace all the quantities in (8) by their bulk average values.
However, this is, in general, illegal. With each "reference" point is associated
a specific sequence $n(1), n(2), \ldots$. The conditional probability that $n(r) = k$
differs from the bulk probability that a vertex has $k$ $r$th-nearest neighbors. It
depends on the sequence leading to $n(r)$. One has to attach probability
measures to possible sequences in graphs and also to graphs. The problem
is not trivial but it was solved by Ambjørn et al. [] precisely for the class of
connected tree graphs considered in ref. [11]. The result is that generically
the Hausdorff dimension is finite and therefore the graph diameter grows
like a power of $N$, as already mentioned. Hence, I end this talk with another
query:

Query: What are the general conditions ensuring that the "small world"
behavior $\langle r \rangle \sim \log N$ does actually hold as an exact result for $N \to \infty$?
A nice theorem is waiting to be formulated and proved!

I wish to thank Serguei Dorogovtsev for pointing out to me that an argument
used in the original version of this text is spurious.